The Full-text Multilingual Corpus: Breaking the Translation Memory Bottleneck

Author

  • Daniel Gervais
Abstract

Driven by fast-paced global competition, in which the time-to-market of new products, services and communications in multiple languages and cultures is mission-critical, organizations increasingly demand translation services that provide faster turnaround while maintaining the highest level of quality. A key driver behind this need for speed and quality is the ongoing explosion of web-based content and the related expectations of content freshness and quality. Operating in a competitive and typically fixed-price environment, translation service providers need to respond with significant gains in translator productivity while continuously improving translation quality. Moreover, translators and terminologists do not work in isolation: they are members of a complex language management value chain that consists of monolingual and multilingual authors, reviewers, translators, terminologists, and content consumers, who may reside in multiple organizations. To fully optimize language management, all of these participants must be able to share language resources seamlessly and collaborate in real time.

Similar resources

Processing Annotated TMX Parallel Corpora

In recent years the amount of freely available multilingual corpora has grown exponentially. Unfortunately the way these corpora are made available is very diverse, ranging from simple text files or specific XML schemas to supposedly standard formats like the XML Corpus Encoding Initiative, the Text Encoding Initiative, or even the Translation Memory Exchange formats. In this documen...

Using Parallel Corpora for Word Sense Disambiguation

Word Sense Disambiguation (WSD) is the Natural Language Processing (NLP) task that consists in selecting the correct sense of a polysemous word in a given context. Most state-of-the-art WSD systems are supervised classifiers that are trained on manually sense-tagged corpora, which are very time-consuming and expensive to build. In order to overcome this acquisition bottleneck (sense-tagged corp...

Building a Multilingual Named Entity-Annotated Corpus Using Annotation Projection

As developers of a highly multilingual named entity recognition (NER) system, we face an evaluation resource bottleneck problem: we need evaluation data in many languages, the annotation should not be too time-consuming, and the evaluation results across languages should be comparable. We solve the problem by automatically annotating the English version of a multi-parallel corpus and by project...

Active Learning for Multilingual Statistical Machine Translation

Statistical machine translation (SMT) models require bilingual corpora for training, and these corpora are often multilingual with parallel text in multiple languages simultaneously. We introduce an active learning task of adding a new language to an existing multilingual set of parallel text and constructing high quality MT systems, from each language in the collection into this new target lan...

Multilingual Corpora - Current Practice and Future Trends

In this paper I would like to give an overview of multilingual corpus building to date. In doing so, I will review two types of multilingual corpus, parallel and translation corpora. Following this, I will consider what tools are currently available which allow for the exploitation of such corpora in the context of machine/machine aided translation. Throughout I will give a fairly global view o...

Publication date: 2010